How are tokens per minute (TPM) calculated?

For billing purposes, tokens per minute (TPM) are calculated by averaging the number of tokens used over 15-minute intervals aligned to the top of the hour (e.g., 3:00 to < 3:15, 3:15 to < 3:30, etc.). If the total tokens used within a 15-minute period is below your Scale Tier entitlement, those tokens are not billed.
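The windowing described above can be sketched in a few lines of Python. This is only an illustration, not the provider's billing code: the function names, the sample usage records, and the `ENTITLEMENT_TPM` value are all hypothetical, and it assumes the entitlement is expressed in TPM and compared against each window's average.

```python
from collections import defaultdict
from datetime import datetime

WINDOW_MINUTES = 15
ENTITLEMENT_TPM = 50_000  # hypothetical entitlement, for illustration only


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to its 15-minute window, aligned to the top of the hour."""
    floored_minute = ts.minute - ts.minute % WINDOW_MINUTES
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def average_tpm_by_window(usage):
    """usage: iterable of (timestamp, tokens). Returns {window start: average TPM}."""
    totals = defaultdict(int)
    for ts, tokens in usage:
        totals[window_start(ts)] += tokens
    # Average TPM is the window's total tokens spread over its 15 minutes.
    return {start: total / WINDOW_MINUTES for start, total in totals.items()}


# Made-up usage records: two fall in 3:00 to < 3:15, one in 3:15 to < 3:30.
usage = [
    (datetime(2024, 1, 1, 3, 2), 300_000),
    (datetime(2024, 1, 1, 3, 14, 59), 150_000),
    (datetime(2024, 1, 1, 3, 15), 900_000),
]

averages = average_tpm_by_window(usage)
for start, tpm in sorted(averages.items()):
    status = "below entitlement" if tpm < ENTITLEMENT_TPM else "at or above entitlement"
    print(f"{start:%H:%M}  avg TPM = {tpm:,.0f}  ({status})")
```

Note that a request at 3:14:59 counts toward the 3:00 window, while one at exactly 3:15:00 starts the next window, matching the "3:00 to < 3:15" boundaries.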

How many tokens does a request use?

In this example, each request uses approximately 100 tokens. With a TPM quota of 100,000, the token quota alone would allow up to 1,000 requests per minute (100,000 tokens / 100 tokens per request). However, the RPM quota is set at 6 RPM for every 1,000 TPM, so it caps you at 600 requests per minute, and that lower limit is the one that applies.
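The arithmetic above can be checked with a small Python sketch. The function name and parameters are hypothetical, and the 6 RPM per 1,000 TPM ratio is taken from the answer above rather than from any API:

```python
def effective_rpm(tpm_quota: int, tokens_per_request: int, rpm_per_1000_tpm: int = 6):
    """Return (token-bound RPM, RPM quota, effective RPM) for a given TPM quota."""
    # Requests per minute the token quota alone would allow.
    token_bound = tpm_quota // tokens_per_request
    # RPM quota derived from the TPM quota (6 RPM per 1,000 TPM in this example).
    rpm_quota = tpm_quota * rpm_per_1000_tpm // 1000
    # Whichever limit is lower is the one you actually hit.
    return token_bound, rpm_quota, min(token_bound, rpm_quota)


token_bound, rpm_quota, effective = effective_rpm(100_000, 100)
print(token_bound, rpm_quota, effective)  # prints: 1000 600 600
```

With larger requests the token quota becomes the binding limit instead: at 500 tokens per request, the same 100,000 TPM quota allows only 200 requests per minute, well under the 600 RPM cap.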

Should I understand my expected tokens per minute (TPM) usage before migrating workloads?

Yes. You should understand your expected tokens per minute (TPM) usage in detail before migrating workloads to PTU. Determining how many provisioned throughput units (PTUs) your workload requires is an essential step in optimizing performance and cost. This section describes how to use the Azure OpenAI capacity planning tool.
